Prepare Dataset

Monthly CNG usage is normalized by dividing by the number of meters and by the area of the building. The dataset covers 2017 through 2022. The data are log-transformed after adding a small positive constant: the 1% quantile of the positive values in the dataset, which is \(3.5\times10^{-5}\).
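The offset is needed because the logarithm of zero is undefined, so accounts with zero usage in a month would otherwise drop out. The following is a minimal Python sketch of the normalization and transform described above, not the code used to produce this report. The function name, the array names, and the sample numbers are hypothetical, and it assumes usage is never negative.

```python
import numpy as np

def log_transform_with_offset(values, q=0.01):
    """Return (log(values + c), c), where c is the q-quantile of the positive entries."""
    values = np.asarray(values, dtype=float)
    positive = values[values > 0]
    if positive.size == 0:
        raise ValueError("need at least one positive value to choose the offset")
    offset = np.quantile(positive, q)
    return np.log(values + offset), offset

# Hypothetical monthly usage, meter counts, and building areas for six accounts.
usage = np.array([0.0, 120.0, 300.0, 0.0, 45.0, 900.0])
n_meters = np.array([1, 2, 1, 1, 3, 4])
area = np.array([1000.0, 2500.0, 800.0, 1200.0, 600.0, 5000.0])

# Normalize: usage per meter per unit of building area.
normalized = usage / n_meters / area

# Shift by the 1% quantile of the positive values, then take the log.
transformed, offset = log_transform_with_offset(normalized, q=0.01)
print(offset)
print(transformed)
```

Because the offset is taken from the positive values only, it stays strictly positive even when many months have zero usage, so every transformed value is finite.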

Clustering Using the K-means Method, with C-index and Silhouette Index

The number of clusters is selected using the silhouette index (which ranges from -1 to 1; the closer to 1 the better) and the C-index (the smaller the better). Some silhouette index values are missing because, if any cluster contains only one account, the calculation of the silhouette index returns NaN (a division by zero in the procedure).
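To make the two criteria concrete, here is a minimal Python sketch that computes both indices from a data matrix and a vector of cluster labels. It is an illustration under stated assumptions, not the implementation behind the numbers in this report: it uses Euclidean distances, the standard definitions of the two indices, and hypothetical toy data and function names. The NaN return for single-account clusters is written in deliberately to mirror the behavior described above.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def silhouette_index(X, labels):
    """Mean silhouette width; NaN if any cluster has a single member."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = pairwise_distances(X)
    clusters = np.unique(labels)
    sizes = {c: int((labels == c).sum()) for c in clusters}
    if min(sizes.values()) < 2:
        # a(i) would be 0/0 for the lone member (assumption: report NaN).
        return float("nan")
    scores = []
    for i in range(len(X)):
        own = labels[i]
        # a: mean distance to the other members of the point's own cluster.
        a = D[i, labels == own].sum() / (sizes[own] - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def c_index(X, labels):
    """C-index: (S_w - S_min) / (S_max - S_min), smaller is better."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = pairwise_distances(X)
    upper = np.triu_indices(len(X), k=1)          # each pair counted once
    dists = D[upper]
    same = (labels[:, None] == labels[None, :])[upper]
    n_w = int(same.sum())                         # number of within-cluster pairs
    s_w = dists[same].sum()                       # sum of within-cluster distances
    ordered = np.sort(dists)
    s_min = ordered[:n_w].sum()                   # n_w smallest distances overall
    s_max = ordered[len(ordered) - n_w:].sum()    # n_w largest distances overall
    return float((s_w - s_min) / (s_max - s_min))

# Hypothetical toy data: two tight, well-separated groups of three points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
good = np.array([0, 0, 0, 1, 1, 1])        # matches the true groups
bad = np.array([0, 1, 0, 1, 0, 1])         # mixes the groups
singleton = np.array([0, 0, 0, 0, 0, 1])   # one cluster with a single point

print(silhouette_index(X, good), c_index(X, good))
print(silhouette_index(X, bad), c_index(X, bad))
print(silhouette_index(X, singleton))
```

In a selection loop, the labels would come from running K-means once per candidate number of clusters, and both indices would be recorded for each run.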

Plot of 2 clusters

## [1] "The Silhouetteindex is 0.226. The number of cluster is 2"
## [1] "The C-index is 0.356. The number of cluster is 2"

04000105887448 MANSFIELD APTS BLDG 13 (4 UNITS)
04000105674440 HILLTOP APT-COMMUNITY CENTER BLDG#3

Plot of 3 clusters

## [1] "The Silhouetteindex is 0.304. The number of cluster is 3"
## [1] "The C-index is 0.08. The number of cluster is 3"

Plot of 4 clusters

## [1] "The Silhouetteindex is 0.224. The number of cluster is 4"
## [1] "The C-index is 0.124. The number of cluster is 4"

Plot of 5 clusters

## [1] "The Silhouetteindex is 0.224. The number of cluster is 5"
## [1] "The C-index is 0.108. The number of cluster is 5"

Plot of 6 clusters

## [1] "The Silhouetteindex is 0.226. The number of cluster is 6"
## [1] "The C-index is 0.076. The number of cluster is 6"

Plot of 7 clusters

## [1] "The Silhouetteindex is 0.197. The number of cluster is 7"
## [1] "The C-index is 0.067. The number of cluster is 7"

Plot of 8 clusters

## [1] "The Silhouetteindex is 0.174. The number of cluster is 8"
## [1] "The C-index is 0.071. The number of cluster is 8"

Plot of 9 clusters

## [1] "The Silhouetteindex is 0.168. The number of cluster is 9"
## [1] "The C-index is 0.067. The number of cluster is 9"
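The index values printed above can be collected and compared side by side. The short Python sketch below copies the reported numbers into two dictionaries (the variable names are hypothetical) and looks up the number of clusters favored by each criterion. It only summarizes the output and does not by itself settle the choice.

```python
# Index values copied from the output above, keyed by number of clusters.
silhouette_by_k = {2: 0.226, 3: 0.304, 4: 0.224, 5: 0.224,
                   6: 0.226, 7: 0.197, 8: 0.174, 9: 0.168}
c_index_by_k = {2: 0.356, 3: 0.08, 4: 0.124, 5: 0.108,
                6: 0.076, 7: 0.067, 8: 0.071, 9: 0.067}

# Silhouette index: larger is better.
best_by_silhouette = max(silhouette_by_k, key=silhouette_by_k.get)

# C-index: smaller is better; keep every k that ties for the minimum.
lowest_c = min(c_index_by_k.values())
best_by_c_index = [k for k, v in c_index_by_k.items() if v == lowest_c]

print(best_by_silhouette)  # 3
print(best_by_c_index)     # [7, 9]
```

The two criteria do not point to the same number of clusters here: the silhouette index peaks at 3 clusters (0.304), while the C-index is lowest at 7 and 9 clusters (0.067), with 3 clusters close behind at 0.08.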